The Impact of Aggregation Bias Upon the
Interpretation of Test Scores Across Schools 

It is often necessary and convenient to use aggregate data concerning 
questions that are to a large extent about individuals. Research in public 
policy, education, demography, economics and sociology has often focused on the 
proper specification of models that rely on aggregate-level data. The question
of what inferences can be made about micro-level relationships from macro-level
data has long been debated, and the techniques designed to mitigate the problems
of aggregate data have been subject to continuous refinement.

This paper focuses on inferences made from the group level to the individual
level. Since it is not possible to divorce substantive problems of model
formation from methodological questions concerning technique, the example of
making inferences about test scores across schools and colleges will be used to
explore questions about aggregation bias.

Previous research on analyzing grouped or aggregate data has followed two 
separate paths of development. One perspective is represented by the seminal 
work of Robinson (1950) in sociology, while another is represented by the work 
of Prais and Aitchison (1954) in economics. Robinson's ecological correlation
approach and the grouping data approach of Prais and Aitchison are
complementary.

Analysis of Variance Approach to Aggregation 

The analysis of covariance approach to grouping grows out of Robinson's
approach to aggregation. The method partitions the sum of squares of the
dependent variable about its mean into the sum of squares explained by the
covariates, the sum of squares explained by the groups, and the residual sum
of squares.



Following the notation of Johnston (1972:192-207), a simple model is defined as

$$ y = X\beta + u \tag{1} $$

where the sample $y$ is an $(n \times 1)$ column vector of micro-level
observations composed of $p$ sub-vectors, i.e., the groups. The independent
variables form the $(n \times k)$ matrix $X$, likewise divided into $p$ groups,
whose first column is all ones to allow a constant term, while $\beta$ is a
$(k \times 1)$ vector of coefficients. The vector $u$ contains stochastic noise
values where $E(u) = 0$. To incorporate the possible effect of the $p$ groups,
an expanded model is

$$ y = D\alpha + X\beta + u \tag{2} $$

which allows the $p$ groups to have different constant terms; thus $\alpha$ is a
vector of $(p - 1)$ elements. The matrix $D$ consists of dummy variables and has
order $(M \times [p - 1])$, where $M = \sum_{i=1}^{p} m_i$ is the sum of the
numbers of observations in the $p$ groups. For instance,

$$ D' = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix} \tag{3} $$

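Recall that $D$ indexes the $p$ groups, with group $i$ having $m_i$ elements. To
estimate (1) above, start with

$$ y = X\hat{\beta} + s \tag{4} $$

where $s$ is the vector of least-squares residuals and $\hat{\beta}$ is given by

$$ \hat{\beta} = (X'X)^{-1}X'y \tag{5} $$

An additional relationship may be derived as

$$ y'y = \hat{\beta}'X'y + s's \tag{6} $$

Returning to (2) above, the estimation of

$$ y = D\hat{\alpha} + X\hat{\beta} + e \tag{7} $$

becomes

$$ \begin{bmatrix} \hat{\alpha} \\ \hat{\beta} \end{bmatrix}
= \begin{bmatrix} D'D & D'X \\ X'D & X'X \end{bmatrix}^{-1}
\begin{bmatrix} D'y \\ X'y \end{bmatrix} \tag{8} $$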
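The partition described in this section (covariates, then groups, then residual) can be sketched numerically. The following is a minimal illustration under stated assumptions, not the procedure actually used for the tables in this paper: the function name `ancova_partition`, the simulated data, and the choice to enter the covariates before the group dummies are all assumptions made for the example.

```python
import numpy as np

def ancova_partition(y, X, groups):
    """Split the SS of y about its mean into covariate, group and residual parts.

    X holds the covariates WITHOUT a constant column; `groups` holds one
    integer label per observation. Covariates are entered first, then groups.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    ones = np.ones((len(y), 1))
    labels = np.unique(groups)
    # Dummy matrix D with p - 1 columns; the first group is the reference.
    D = (groups[:, None] == labels[None, 1:]).astype(float)

    def rss(Z):
        # Residual sum of squares from least squares of y on Z.
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        return float(resid @ resid)

    ss_total = float(((y - y.mean()) ** 2).sum())
    rss_cov = rss(np.hstack([ones, X]))       # model (1): constant + covariates
    rss_full = rss(np.hstack([ones, X, D]))   # model (2): adds group dummies
    return {
        "covariates": ss_total - rss_cov,
        "groups": rss_cov - rss_full,
        "residual": rss_full,
        "total": ss_total,
    }

# Simulated example (hypothetical data): 4 groups of 50 observations each.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), 50)
x = rng.normal(size=(200, 2))
group_effect = np.array([0.0, 0.5, -0.5, 1.0])[groups]
y = 1.0 + x @ np.array([0.8, -0.3]) + group_effect + rng.normal(scale=0.5, size=200)

parts = ancova_partition(y, x, groups)
for name, value in parts.items():
    print(f"{name:10s} {value:10.2f}")
```

Because the two models are nested, the three components add up to the total sum of squares, and each component divided by the total gives the kind of proportion reported alongside the sums of squares in Table 1.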



The Generalized Least-Squares Approach to Aggregation 
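Starting from (1) of the previous section, grouping the observations into $p$
groups and taking group means yields (Johnston, pp. 228-241):

$$ \bar{y} = \bar{X}\beta + \bar{u} \tag{12} $$

The ungrouped data are then related to the aggregated forms by

$$ \bar{y} = Gy \tag{13} $$
$$ \bar{X} = GX \tag{14} $$
$$ \bar{u} = Gu \tag{15} $$

with $G$ as the grouping matrix of order $(p \times n)$. The form of $G$ is, for
instance,

$$ G = \begin{bmatrix}
1/m_1 & \cdots & 1/m_1 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & 1/m_2 & \cdots & 1/m_2 & \cdots & 0 & \cdots & 0 \\
\vdots & & & & & & & & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 1/m_p & \cdots & 1/m_p
\end{bmatrix} \tag{16} $$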

While $E(\bar{u}) = 0$ and the ungrouped disturbances satisfy

$$ E(uu') = \sigma^2 I \tag{17} $$

ordinary least squares applied to the group means will be unbiased but
inefficient, because the grouped disturbances instead satisfy

$$ E(\bar{u}\bar{u}') = \sigma^2 GG' \tag{18} $$

An efficient estimator of $\beta$ is given by generalized least squares:

$$ b = [\bar{X}'(GG')^{-1}\bar{X}]^{-1}\bar{X}'(GG')^{-1}\bar{y} \tag{19} $$

and

$$ \mathrm{var}(b) = \sigma^2[\bar{X}'(GG')^{-1}\bar{X}]^{-1} \tag{20} $$

Here, generalized least squares overcomes the heteroscedasticity of the grouped
disturbances by inserting the grouping matrix $G$ through (18). The expression
$(GG')^{-1}$ is actually a weighting matrix which contains the number of
observations in each group. Note that the generalized least squares estimates
from grouped data are not as efficient as the estimates from the ungrouped data.
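The role of $(GG')^{-1}$ as a weighting matrix can be checked numerically. The sketch below is illustrative only: the group sizes, the simulated data, and the helper names `grouping_matrix` and `grouped_gls` are assumptions introduced for the example. It builds a grouping matrix whose rows average the members of each group, confirms that $(GG')^{-1}$ is the diagonal matrix of group sizes, and shows that the estimator in (19) coincides with least squares on the group means weighted by group size.

```python
import numpy as np

def grouping_matrix(sizes):
    """Build G (p x n): row i holds 1/m_i over the m_i members of group i."""
    sizes = np.asarray(sizes, dtype=int)
    G = np.zeros((len(sizes), int(sizes.sum())))
    start = 0
    for i, m in enumerate(sizes):
        G[i, start:start + m] = 1.0 / m
        start += m
    return G

def grouped_gls(y, X, sizes):
    """Generalized least squares on group means, following equation (19)."""
    G = grouping_matrix(sizes)
    y_bar, X_bar = G @ y, G @ X
    W = np.linalg.inv(G @ G.T)  # weighting matrix (GG')^{-1}
    b = np.linalg.solve(X_bar.T @ W @ X_bar, X_bar.T @ W @ y_bar)
    return b, W

# Simulated example (hypothetical data): five groups of unequal size.
rng = np.random.default_rng(1)
sizes = np.array([3, 5, 8, 4, 10])
n = int(sizes.sum())
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 0.7]) + rng.normal(scale=0.3, size=n)

b_gls, W = grouped_gls(y, X, sizes)

# Equivalent weighted least squares: scale each group-mean row by sqrt(m_i).
G = grouping_matrix(sizes)
root_m = np.sqrt(sizes)
b_wls, *_ = np.linalg.lstsq(root_m[:, None] * (G @ X), root_m * (G @ y), rcond=None)

print("diagonal of (GG')^-1:", np.diag(W))
print("GLS estimate:", b_gls)
print("WLS estimate:", b_wls)
```

Since each row of $G$ has $m_i$ entries equal to $1/m_i$ and the rows do not overlap, $GG'$ is diagonal with entries $1/m_i$, which is why its inverse carries the group sizes.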

The Effect of Aggregation on Interpreting Test Scores 

Of substantive interest is the policy question of the relationship of
test scores and academic background with performance in college-level work.
The availability of grouped data by colleges, especially in terms of historical
data, has made meta-analysis possible. This section seeks to explore the impact
of grouping by college upon the relationship of college performance as measured
by grade point average (GPA) with Scholastic Aptitude Test - Verbal (SAT-V) and
Scholastic Aptitude Test - Mathematics (SAT-M) scores, and high school average
(HSA). Data to investigate these questions were obtained for a large state
college system with over 30 individual colleges. Over 45,000 undergraduate
students who had completed one or more academic terms are included. As a control
for college experience, the numbers of credit hours attempted and earned were
employed.

Often when test scores are used as indicators of academic performance,
questions of fairness to ethnic, racial, and gender groups have been raised. In
this paper, the analysis of the influence of aggregation will be done by gender
and minority group status.

To test for the impact of aggregation on the relation of GPA with SAT
scores, HSA, and credit hours, an analysis of covariance was performed in which
the groups were the colleges and the test scores, HSA, and credit hours were the
covariates. The partitioned sums of squares (SS) for this analysis are presented
in Table 1. These show that the covariates are more strongly related to GPA for
white females (WF) and white males (WM) than for black students of either sex.
However, the R² values for black females (BF) and black males (BM) are
moderately strong for this type of data. The SS associated with the grouping,
i.e., the college effect, show that the college is more important for the BF and
BM groups than for the WF and WM groups, accounting for more than 10% of the SS.
Colleges account for 6% of the variation for the WF and WM groups. Clearly,
colleges as a grouping factor have an independent influence on GPA even when
controlling for test scores and academic background. College is an independent
grouping factor and is in no sense like a random grouping.
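As an arithmetic check, the shares quoted in this paragraph can be recomputed from the sums of squares reported in Table 1. The short sketch below uses only those published figures; the variable names `table1` and `shares` are introduced here for illustration.

```python
# Sums of squares from Table 1: (covariates, colleges, residual) per group.
table1 = {
    "BF": (470, 327, 1911),
    "BM": (287, 158, 1085),
    "WF": (3379, 576, 6025),
    "WM": (3178, 496, 4977),
    "Total": (8093, 1335, 14988),
}

shares = {}
for group, (ss_cov, ss_col, ss_res) in table1.items():
    ss_total = ss_cov + ss_col + ss_res
    # Proportion of the total SS attributed to covariates and to colleges.
    shares[group] = (ss_cov / ss_total, ss_col / ss_total)
    print(f"{group:5s}  covariates {shares[group][0]:.2f}  colleges {shares[group][1]:.2f}")
```

The college shares come out at roughly .12 and .10 for the BF and BM groups and roughly .06 for the WF and WM groups, in line with the proportions printed in Table 1.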



Given that college as a grouping factor has a clear impact on GPA, it is
interesting to review the regressions of GPA on the covariates to investigate
whether any aggregation bias is present between the individual and college
levels. These regressions are presented in Table 2. As would be expected, the
R² for the college level is somewhat higher than for the individual level. Of
the five regressions, the regression coefficient for SAT-V scores is negative at
the college level in four cases. For WF, the coefficient for SAT-M is negative
at the college level. These negative coefficients at the college level, when the
individual-level coefficients are positive, are strongly indicative of the bias
introduced by grouping. The decline in magnitude of the HSA coefficients for
four of the five regressions is an additional substantive outcome of the
grouping factor. It is clear that the regression coefficients are markedly
changed when grouped data are employed. Thus the typical measures of importance
in the regression analysis were altered considerably by the aggregation effect.
Anyone hoping to ascertain individual relationships from aggregate data of this
type should expect little likelihood of success.
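A sign reversal of this kind is easy to reproduce with simulated data. The sketch below is purely illustrative and does not use the data analyzed in this paper: the number of colleges, the coefficient values, and the assumption that colleges with higher average scores grade more severely are all hypothetical choices made so that grouping is non-random.

```python
import numpy as np

rng = np.random.default_rng(42)
p, m = 30, 200                              # hypothetical: 30 colleges, 200 students each
college = np.repeat(np.arange(p), m)

mu = np.linspace(-1.0, 1.0, p)              # college mean of a standardized test score
x = mu[college] + rng.normal(size=p * m)    # individual test scores
college_effect = -1.0 * mu                  # assumed: stricter grading where scores are higher
y = 0.5 * x + college_effect[college] + rng.normal(scale=0.5, size=p * m)

def slope(x, y):
    """Least-squares slope of y on x with a constant term."""
    Z = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1]

# College means of x and y (all colleges have the same size m).
x_bar = np.bincount(college, weights=x) / m
y_bar = np.bincount(college, weights=y) / m

b_individual = slope(x, y)                  # pooled individual-level regression
b_college = slope(x_bar, y_bar)             # regression on college means
b_within = slope(x - x_bar[college], y - y_bar[college])  # deviations from college means

print(f"individual level: {b_individual:.3f}")
print(f"college level:    {b_college:.3f}")
print(f"within colleges:  {b_within:.3f}")
```

Under these assumptions the pooled individual-level slope is positive, the slope computed from college means is negative, and only the within-college slope recovers the individual-level coefficient of 0.5 built into the simulation.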

Summary and Discussion 

The purpose of this paper has been to investigate the problem of the impact
of grouping on measuring academic performance. This problem was explored using
the analysis of covariance approach and is related to the clustering approach of
generalized least squares. To illustrate this question, actual data on academic
performance at the individual and grouped (college) levels were explored. It was
found that the individual-level relationships and the college-level
relationships were generally quite different, with regression coefficients often
having different signs. Based upon this research, it would seem to be generally
inappropriate to use grouped data to investigate academic performance across
colleges.



Table 1

Sums of Squares of GPA for Analysis of Covariance

Source of                          Sums of Squares
Variation         BF    R2      BM    R2      WF    R2      WM    R2   Total    R2

Covariates       470   .17     287   .19    3379   .34    3178   .37    8093   .33
Colleges         327   .12     158   .10     576   .06     496   .06    1335   .05
Residual        1911          1085          6025          4977        14,988
Total           2708          1530          9980          8651        24,416
N               9931          3308        18,771        16,765        45,475
Table 2

Regressions for Individuals and Colleges of GPA

Independent                  BF                  BM                  WF                  WM                 Total
Variables            Individual Colleges Individual Colleges Individual Colleges Individual Colleges Individual Colleges

SAT Verbal                  ...      ...        ...      ...        ...    .0056      .0012   -.0036      .0014      ...
SAT Math                    ...      ...        ...      ...      .0004   -.0042      .0008    .0040      .0005      ...
H.S. Average                ...      ...      .2312    .2647      .3709    .1355      .3066    .0017      .3497      ...
Hours Attempted          -.0059    .0007     -.0120   -.0005     -.0115    .0028     -.0168   -.0020     -.0121      ...
Hours Earned              .0081    .0053      .0143    .0029      .0131   -.0006      .0195    .0015      .0143      ...
Constant                  .9840   2.1756     1.1164   1.6248      .3756   1.4998      .4937   1.9899      .5589      ...
Inverse College Size               .3215               .0563              1.8373              4.0176                 ...

N                          5931       33       3308       32     18,771       33     16,765       33     45,475      ...
R2                          .17      .41        .19      .22        .34      .63        .37      .56        .33      ...
S.E.E.                     ±.61     ±.18       ±.61     ±.22       ±.59     ±.16       ±.57     ±.16       ±.60      ...